Instructions

This first assignment requires applying the basic functions of the igraph package. The following datasets will be used:

You have to complete the code chunks in this document, analyze the results, extract insights, and answer the short questions. Fill in the attached CSV with your answers: sometimes a number is enough, other times a short sentence or paragraph. Remember to change the header to your email.

In your submission please upload both this document in HTML and the CSV with the solutions.

Loading data

The goal of this section is to load the given datasets, build the graph, and analyze basic metrics. Include the edge or node attributes you consider relevant.

Describe the values provided by the summary function on the graph object.
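As a rough illustration of what the node and edge counts mean, the sketch below builds an undirected graph from an edge list in plain Python (not igraph) and counts both. The edge list and the `build_graph` helper are hypothetical and only stand in for the real datasets; the assignment itself should use igraph's own loading and summary functions.

```python
from collections import defaultdict

# Hypothetical toy edge list: each pair stands for two actors who share a movie.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def build_graph(edge_list):
    """Return an undirected adjacency map {node: set of neighbors}."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

graph = build_graph(edges)

# Every undirected edge appears in two neighbor sets, hence the division by 2.
n_nodes = len(graph)
n_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
print(n_nodes, n_edges)
```

Note that a graph built only from an edge list has no isolated vertices; if the node file lists actors without any edge, they would have to be added separately.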

1) How many nodes are there?

2) How many edges are there?

Degree distribution

Analyse the degree distribution. Compute the total degree distribution.
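To make the term concrete, here is a minimal plain-Python sketch (not igraph) of a total degree distribution on a hypothetical toy edge list. Counting both endpoints of every edge gives the total degree, which for a directed graph corresponds to in-degree plus out-degree. The function names are illustrative only.

```python
from collections import Counter

# Hypothetical toy edge list standing in for the real dataset.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def degrees(edge_list):
    """Total degree per node: each edge adds 1 to both of its endpoints."""
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_distribution(deg):
    """Map each degree value k to the fraction of nodes that have degree k."""
    counts = Counter(deg.values())
    n = len(deg)
    return {k: counts[k] / n for k in sorted(counts)}

deg = degrees(edges)
dist = degree_distribution(deg)
print(dist)
print(max(deg.values()), min(deg.values()))
```

On real collaboration networks the distribution is often heavily skewed, so plotting it on logarithmic axes may be more informative than a plain histogram.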

3) What does this distribution look like?

4) What is the maximum degree?

5) What is the minimum degree?

Network Diameter and Average Path Length

igraph provides functions to calculate the diameter and the average path length. Consider whether you should take the weights, the edge directions, etc. into account.
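The following plain-Python sketch (not igraph) shows one way both quantities can be defined for an unweighted, undirected graph: run a breadth-first search from every node, then take the longest and the mean of all shortest path lengths. The toy graph and function names are hypothetical. Unreachable pairs are simply skipped here, which is an assumption; igraph's functions have their own options for weights, direction, and disconnected graphs.

```python
from collections import deque

# Hypothetical toy graph as an adjacency map.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}

def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_mean_distance(adj):
    """Longest and average shortest path over all connected pairs of nodes."""
    lengths = []
    for source in adj:
        dist = bfs_distances(adj, source)
        lengths.extend(d for target, d in dist.items() if target != source)
    return max(lengths), sum(lengths) / len(lengths)

diam, avg_len = diameter_and_mean_distance(graph)
print(diam, avg_len)
```

Ignoring weights treats every edge as one step; if edge weights count shared movies, using them as distances would make frequent collaborators look farther apart, which is usually not the intended reading.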

6) What is the diameter of the graph?

7) What is the average path length of the graph?

Node importance: Centrality measures

(Optional but recommended): Obtain the distribution of the number of movies made by an actor and the number of genres in which an actor has starred. It may be useful for analyzing and discussing the results obtained in the following exercises.

Obtain three vectors with the degree, betweenness, and closeness for each vertex of the actors' graph.
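To clarify what the three measures capture, here is a plain-Python sketch (not igraph) on a hypothetical toy graph. Degree counts neighbors, closeness is taken as the inverse of the average distance to the reachable nodes, and betweenness is computed with Brandes' algorithm for unweighted, undirected graphs. The normalization choices are assumptions and may differ from igraph's defaults.

```python
from collections import deque

# Hypothetical toy graph as an adjacency map.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}

def closeness(adj, source):
    """(number of reachable nodes) / (sum of distances to them), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

def betweenness(adj):
    """Brandes' algorithm: shortest paths through each node, unnormalized."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

deg = {v: len(neighbors) for v, neighbors in graph.items()}
clos = {v: closeness(graph, v) for v in graph}
betw = betweenness(graph)
print(deg, clos, betw)
```

Because this closeness only averages over reachable nodes, a vertex in a small isolated component can score very high, which is worth keeping in mind when reading a closeness ranking on a disconnected graph.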


Obtain the list of the 20 actors with the largest degree centrality. It can be useful to show a list with the degree, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.

8) Who is the actor with the highest degree centrality?

The actor with the highest degree centrality: Mark Davis

9) How do you explain the high degree of the top-20 list?

Degree centrality is high for these actors because they participate in many movies. Tom Hanks is an outlier: he has a high degree because of his popularity, and he is not from the adult movie industry. Adult movie actors are able to act in more movies than Hollywood actors because those movies are shorter and cheaper to make.

Obtain the list of the 20 actors with the largest betweenness centrality. Show a list with the betweenness, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.

10) Who is the actor with the highest betweenness?

The actor with the highest betweenness: Ron Jeremy

11) How do you explain the high betweenness of the top-20 list?

High betweenness correlates with fame: actors such as Samuel L. Jackson and Salma Hayek have high betweenness because of their celebrity profiles as Hollywood stars.

Obtain the list of the 20 actors with the largest closeness centrality. Show a list with the closeness, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.

12) Who is the actor with highest closeness centrality?

The actor with the highest closeness centrality is Julian Armanis, with a value of 0.714286.

13) How do you explain the high closeness of the top-20 list?

The top 20 actors are closely connected to the rest of the network: their average distance to other actors is short, so other nodes can reach them in few steps.

Network Models (Optional)

Explore the Erdős-Rényi model and compare its structural properties to those of real-world networks (actors):
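As a starting point for the comparison, the sketch below generates a G(n, p) random graph in plain Python (not igraph): every possible pair of nodes is linked independently with probability p. The values of `n`, `p`, and `seed` are arbitrary assumptions chosen for illustration; for a meaningful comparison they would be matched to the node count and density of the actors' graph.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """G(n, p): link each of the n*(n-1)/2 node pairs with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Arbitrary illustrative parameters, not taken from the assignment data.
n, p = 200, 0.05
g = erdos_renyi(n, p, seed=42)

mean_degree = sum(len(neighbors) for neighbors in g.values()) / n
expected_mean_degree = p * (n - 1)
print(mean_degree, expected_mean_degree)
```

In this model the degrees concentrate around p(n - 1), so a real network with a few very high-degree hubs will typically show a much wider degree distribution than its random counterpart.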

Community detection (Optional)

Use any community detection algorithm for the actors' network and discuss whether the communities found make sense according to the vertex labels.
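One possible way to check whether communities agree with a vertex label is to look at the most common label inside each community and the share of members that carry it. The plain-Python sketch below assumes a community membership has already been obtained from some detection algorithm; the `membership` and `main_genre` mappings are hypothetical toy data, and `dominant_label_share` is an illustrative helper, not an igraph function.

```python
from collections import Counter, defaultdict

# Hypothetical output of a community detection step: node -> community id.
membership = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
# Hypothetical vertex label: node -> main genre.
main_genre = {"A": "Drama", "B": "Drama", "C": "Comedy",
              "D": "Action", "E": "Action"}

def dominant_label_share(membership, labels):
    """For each community, return (most common label, fraction of members with it)."""
    groups = defaultdict(list)
    for node, community in membership.items():
        groups[community].append(labels[node])
    summary = {}
    for community, group_labels in groups.items():
        label, count = Counter(group_labels).most_common(1)[0]
        summary[community] = (label, count / len(group_labels))
    return summary

summary = dominant_label_share(membership, main_genre)
print(summary)
```

A share close to 1 suggests the community lines up with that label, while a share near the label's overall frequency in the graph suggests the community is driven by something else.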